Abstract

In this paper we propose a global optimization-based approach to jointly matching a set of images. The estimated correspondences simultaneously maximize pairwise feature affinities and cycle consistency across multiple images. Unlike previous convex methods relying on semidefinite programming, we formulate the problem as a low-rank matrix recovery problem and show that the desired semidefiniteness of a solution can be spontaneously fulfilled. The low-rank formulation enables us to derive a fast alternating minimization algorithm in order to handle practical problems with thousands of features. Both simulation and real experiments demonstrate that the proposed algorithm can achieve a competitive performance with an order of magnitude speedup compared to the state-of-the-art algorithm. In the end, we demonstrate the applicability of the proposed method to match the images of different object instances and as a result the potential to reconstruct category-specific object models from those images.

1. Introduction 

Finding feature correspondences between two images is a fundamental problem in computer vision with various applications such as structure from motion, image registration, and shape analysis, to name a few. While previous efforts were mostly focused on matching a pair of images, many tasks require finding correspondences across multiple images. A typical example is nonrigid structure from motion [3, 12], where one can hardly reconstruct a nonrigid shape from two frames. Furthermore, recent work has shown that leveraging multi-way information can dramatically improve matching results compared to pairwise matching [29, 16].

The most important constraint for joint matching is the cycle consistency, i.e., the composition of matches along a loop of images should be identity, as illustrated in Figure 1. Given pairwise matches, one can possibly identify true or false matches by checking all cycles in the image collection. But there are many difficulties for this approach [10]. For example, the input pairwise matches are often very noisy with many false matches and missing matches, and the features detected from different images may only have a partial overlap even if the same feature detector is applied [27]. Therefore, it is likely that very few consistent cycles can be found. Moreover, how to sample cycles is not straightforward due to the huge number of possibilities [16]. Recent work on joint matching has shown that, if all feature correspondences within multiple images are denoted by a large binary matrix, the cycle consistency can be translated into the fact that such a matrix should be positive semidefinite and low-rank [18, 29, 16]. Based on this observation, convex optimization-based algorithms were proposed, which achieved state-of-the-art performance with theoretical guarantees [16, 10]. But these algorithms rely on semidefinite programming (SDP), which is not computationally efficient enough to handle image matching problems in practice.

Figure 1. An illustration of consistent multi-image matching.

In this paper, we propose a novel algorithm for multi-image matching. The inputs to our algorithm are original similarities between feature descriptors such as SIFT descriptors [25] and deep features [35], or optimized affinities provided by existing graph matching solvers [22]. The outputs are feature correspondences between all pairs of images. Unlike many previous methods starting from quantized pairwise matches [29, 10], we postpone the decision until we optimize for both pairwise affinities and multi-image consistency. Instead of using SDP relaxation, we formulate the problem as a low-rank matrix recovery problem and employ the nuclear-norm relaxation for rank minimization (Section 4.1). We show that the positive semidefiniteness of a desired solution can be spontaneously fulfilled (Section 4.2). Moreover, we derive a fast alternating minimization algorithm to globally solve the problem in the low-dimensional variable space (Section 5). Besides validating our method on both simulated and real benchmark datasets, we also demonstrate the applicability of the proposed method combined with deep learning and graph matching to match images with different objects and reconstruct category-specific object models (Section 6).

2. Related work 

The early work on joint matching aimed to select cycle-consistent matches and identify incorrect matches from bad cycles [41, 28]. The assumption for this family of methods is that correct matches are dominant in the raw input. Otherwise, it is difficult to find a sufficient number of closed cycles [16]. Some works proposed to use the cycle consistency as an explicit constraint for sparse feature matching [38, 37, 39, 36] or pixel-wise flow computation [42], but the resulting optimization problems are nonconvex and can hardly be solved globally. Recent results in [18, 16, 29] showed that the consistent matches could be extracted from the spectrum (top eigenvectors) of the matrix composed of all pairwise matches. The rationale behind this spectral technique is that the problem can be formulated as a quadratic integer program and relaxed into a generalized Rayleigh problem. But the relaxation assumes full feature correspondences (bijection) between images [29]. Recently, Huang and Guibas [16] proposed an elegant solution based on convex relaxation and derived the theoretical conditions for exact recovery. The result is further improved in [10] by assuming that the underlying rank of the variable matrix can be reliably estimated. In these works, the problem is formulated as an SDP, which has limited computational efficiency in real applications.

Regarding methodology, our work is inspired by the recent advances on low-rank matrix recovery which make use of convex relaxation [8, 7] and explore the underlying low-rank structure to accelerate computation [5, 15]. Our work is also related to some other problems that aim to find global estimates from pairwise estimates such as rotation averaging [14, 34] and model fusion [40].

3. Preliminaries and notation 

Suppose we have $n$ images and $p_i$ features from each image $i$. The objective is to find feature correspondences between all pairs of images. Before introducing the proposed method, we first give a brief introduction to pairwise matching techniques and the definition of cycle consistency.

3.1. Pairwise matching 

To match an image pair $(i, j)$, one can compute similarities for all pairs of feature points from two images and store them in a matrix $S_{ij} \in \mathbb{R}^{p_i \times p_j}$.

We represent the feature correspondences for image pair $(i, j)$ by a partial permutation matrix $X_{ij} \in \{0, 1\}^{p_i \times p_j}$, which satisfies the doubly stochastic constraints:

$$ 0 \le X_{ij} \mathbf{1} \le \mathbf{1}, \qquad 0 \le X_{ij}^T \mathbf{1} \le \mathbf{1}. \tag{1} $$

To find $X_{ij}$, we can maximize the inner product between $X_{ij}$ and $S_{ij}$ subject to the constraints in (1), resulting in a linear assignment problem, which has been well studied and can be efficiently solved by the Hungarian algorithm.
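The linear assignment step can be sketched as follows. This is a minimal illustration, assuming SciPy's `linear_sum_assignment` as the solver; the function name `pairwise_match` and the toy scores are hypothetical. Note that this solver always assigns every feature of the smaller image, which coincides with the maximizer under (1) when all scores are nonnegative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def pairwise_match(S_ij):
    """Maximize <X_ij, S_ij> over (partial) permutation matrices X_ij."""
    rows, cols = linear_sum_assignment(S_ij, maximize=True)
    X_ij = np.zeros(S_ij.shape, dtype=int)
    X_ij[rows, cols] = 1          # one match per assigned row
    return X_ij

# Hypothetical similarity scores: 2 features in image i, 3 features in image j
S_ij = np.array([[0.9, 0.1, 0.2],
                 [0.2, 0.8, 0.1]])
X_ij = pairwise_match(S_ij)
```

Each row and column of the returned matrix sums to at most one, so it satisfies the constraints in (1).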

In image matching, spatial rigidity is usually preferred, i.e., the relative location between two features in an image should be similar to that between their correspondences in the other image. This problem is well known as graph matching and formulated as a quadratic assignment problem (QAP). While QAP is NP-hard, many efficient algorithms have been proposed to solve it approximately, e.g., [22, 1, 11]. Those solvers basically relax the binary constraint on the permutation matrix, solve the optimization, and output the confidence of a candidate match being correct. We refer readers to the related literature for details. Here we aim to emphasize that the outputs of graph matching solvers are basically optimized affinity scores of candidate matches, which consider both feature similarity and spatial rigidity. We will use these scores (saved in $S_{ij}$) as our input in some cases.

3.2. Cycle consistency 

Some recent work proposed to use the cycle consistency as a constraint to match a set of images [29, 37, 10]. The cycle consistency can be described by

$$ X_{ij} = X_{iz} X_{zj} \tag{2} $$

for any three images $(i, j, z)$ and can be extended to the case with more images.

The recent results in [16, 29] show that the cycle consistency can be described more concisely by introducing a virtual "universe" that is defined as the set of unique features that appear in the image collection. Each point in the universe may be observed by several images and the corresponding image points should be matched. In this way, consistent matching should satisfy $X_{ij} = A_i A_j^T$, where $A_i \in \{0, 1\}^{p_i \times k}$ denotes the map from Image $i$ to the universe, $k$ is the number of points in the universe, and $k \ge p_i$ for all $i$.

Suppose the correspondences for all $m = \sum_{i=1}^{n} p_i$ features in the image collection are denoted by $X \in \{0, 1\}^{m \times m}$:

$$ X = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nn} \end{pmatrix}, \tag{3} $$

and all $A_i$s are concatenated as rows in a matrix $A \in \{0, 1\}^{m \times k}$. Then we can write $X$ as

$$ X = AA^T. \tag{4} $$

From (4), it is clear that a desired $X$ should be both positive semidefinite and low-rank:

$$ X \succeq 0, \qquad \mathrm{rank}(X) \le k. \tag{5} $$

Using (5), the cycle consistency can be effectively imposed without checking all cycles of pairwise matches. Moreover, partial matching is allowed, while bijection needs to be assumed in (2).
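To make the universe construction concrete, the following sketch builds $X = AA^T$ from hypothetical toy observations and checks the two properties in (5). The helper name `universe_map` and the observation lists are assumptions made for illustration only.

```python
import numpy as np

def universe_map(observed, k):
    """A_i in {0,1}^{p_i x k}: row r marks the universe point seen as feature r."""
    A_i = np.zeros((len(observed), k), dtype=int)
    A_i[np.arange(len(observed)), observed] = 1
    return A_i

k = 4
# Hypothetical toy data: universe indices observed in each of three images
observations = [[0, 1, 2], [2, 0, 3], [3, 1]]
A = np.vstack([universe_map(obs, k) for obs in observations])
X = A @ A.T                               # eq. (4)

min_eig = np.linalg.eigvalsh(X).min()     # nonnegative up to round-off
rank = np.linalg.matrix_rank(X)           # at most k

# Pairwise blocks of X for images 1, 2, 3 (sizes 3, 3, 2)
X12, X23, X13 = X[0:3, 3:6], X[3:6, 6:8], X[0:3, 6:8]
```

With partial observations the composition `X12 @ X23` can only contain matches that also appear in `X13`; it may miss some, because a point seen in images 1 and 3 need not be seen in image 2.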

4. Joint matching via rank minimization 

Given affinity scores $\{S_{ij} \mid 1 \le i, j \le n\}$, we aim to find globally consistent matches $X$. Note that $S_{ij}$ can be all-zero if matching is not performed for a pair $(i, j)$. Moreover, affinity scores can be computed from either feature similarities or graph matching solvers according to specific scenarios, as described in Section 3.1.

4.1. Formulation 

We formulate the problem as a low-rank matrix recovery problem. We maximize the inner product between $X_{ij}$ and $S_{ij}$ for all $i$ and $j$ as multiple linear assignment problems. At the same time, we minimize the rank of $X$ to enforce the cycle consistency. We ignore the positive semidefinite constraint on $X$ and will explain the reasons later.

To make the optimization tractable, we make the following relaxations: (1) $X$ is treated as a real matrix $X \in [0, 1]^{m \times m}$ instead of a binary matrix, which is a general practice in solving matching problems. Experimentally, we found that the solution values were very close to 0 or 1 and could be stably quantized by a threshold of 0.5. This might be attributed to the existence of a linear term in the cost function [26]. (2) The rank of $X$ is replaced by the nuclear norm $\|X\|_*$ (sum of singular values), which is a tight convex relaxation proven to be very effective in various low-rank problems such as matrix completion [8] and robust principal component analysis [7].

The estimated $X$ should be sparse since at most one value in each row of $X_{ij}$ can be nonzero. To induce sparsity, we minimize the sum of values in $X$. Combining all three terms, we obtain the following cost function:

$$ f(X) = -\sum_{i=1}^{n} \sum_{j=1}^{n} \langle S_{ij}, X_{ij} \rangle + \alpha \langle \mathbf{1}, X \rangle + \lambda \|X\|_* = -\langle S - \alpha \mathbf{1}, X \rangle + \lambda \|X\|_*, \tag{6} $$

where $\langle \cdot, \cdot \rangle$ denotes the inner product and $S \in \mathbb{R}^{m \times m}$ is the matrix collecting all $S_{ij}$s. $\alpha$ is the weight of sparsity, which can be interpreted as a threshold to remove small scores in $S_{ij}$s. In our implementation, we normalize the scores to let them lie between 0 and 1 and empirically set $\alpha = 0.1$. $\lambda$ controls the weight of the nuclear norm. We will discuss $\lambda$ in Section 4.2 and Section 6.1.2.

Besides the doubly stochastic constraints in (1), additional constraints shall be imposed on $X$ after relaxation:

$$ X_{ii} = I_{p_i}, \quad 1 \le i \le n, \tag{7} $$

$$ X_{ij} = X_{ji}^T, \quad i \ne j, \tag{8} $$

$$ 0 \le X \le 1, \tag{9} $$

where (7) constrains self-matching to be identity, (8) constrains $X$ to be symmetric, and (9) constrains the values in $X$ to lie in $[0, 1]$.

Finally, we obtain the following optimization problem:

$$ \min_{X} \; \langle W, X \rangle + \lambda \|X\|_*, \quad \text{s.t. } X \in C, \tag{10} $$

where $W = \alpha \mathbf{1} - S$ and $C$ denotes the set of matrices satisfying the constraints given in (1), (7), (8) and (9).

In our experiments, the result doesn't degrade noticeably when removing the doubly stochastic constraints in (1). This might be attributed to the existence of the sparsity regularizer. Therefore, we remove (1) in implementation to accelerate the computation.
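As an illustration of the objective in (10), the sketch below evaluates $\langle W, X \rangle + \lambda \|X\|_*$ for two candidate solutions on a hypothetical two-image problem. The function name `matching_cost` and the toy scores are assumptions; both candidates are cycle consistent, so they share the same nuclear norm and the linear term alone decides between them.

```python
import numpy as np

def matching_cost(X, S, alpha=0.1, lam=50.0):
    """<W, X> + lam * ||X||_* with W = alpha * 1 - S, as in (10)."""
    W = alpha * np.ones_like(S) - S
    return np.sum(W * X) + lam * np.linalg.norm(X, ord="nuc")

I2 = np.eye(2)
P = np.array([[0.0, 1.0], [1.0, 0.0]])     # swaps the two features
S12 = np.array([[0.9, 0.05],
                [0.3, 0.8]])               # hypothetical affinities, image 1 vs 2
S = np.block([[I2, S12], [S12.T, I2]])

X_a = np.block([[I2, I2], [I2, I2]])       # matches features 1-1 and 2-2
X_b = np.block([[I2, P], [P, I2]])         # matches features 1-2 and 2-1

cost_a = matching_cost(X_a, S)
cost_b = matching_cost(X_b, S)
```

Entries of $S$ below $\alpha$ give positive entries in $W$ and are therefore penalized, which is the thresholding effect of $\alpha$ described above.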

4.2. Positive semidefiniteness 

We ignore the positive semidefinite constraint for two reasons: (1) solving SDP is generally unscalable; (2) with the constraints in (7) and (8), the solution to (10) turns out to be nearly positive semidefinite if $\lambda$ is sufficiently large.¹

Suppose $\sigma_1, \cdots, \sigma_m$ are eigenvalues of $X$. From (7), all diagonal entries of $X$ equal 1, and $\sum_{i=1}^{m} \sigma_i = \mathrm{trace}(X) = m$, which implies that the sum of $\sigma_i$s is fixed. From (8), $X$ is symmetric, and the $\sigma_i$s are all real numbers. When we choose a large $\lambda$, $\|X\|_* = \sum_{i=1}^{m} |\sigma_i|$ dominates the cost function, and a solution with all nonnegative $\sigma_i$s will give the lowest cost, because $\sum_{i=1}^{m} |\sigma_i| \ge \sum_{i=1}^{m} \sigma_i = m$ and the equality holds iff $\sigma_i \ge 0$ for all $i$.

The bound $\|X\|_* \ge m$ also implies that the solution to (10) will be insensitive to $\lambda$ when $\lambda$ is sufficiently large, and then minimizing the nuclear norm is equivalent to adding a positive semidefinite constraint. The effect of $\lambda$ is experimentally illustrated in Section 6.1.2.
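The eigenvalue argument can be checked numerically on a small hypothetical example with three images and two features each. Both matrices below satisfy (7) and (8), so their traces equal $m = 6$; only the cycle-consistent one attains the lower bound on the nuclear norm. The helper name `spectrum_summary` is an assumption for illustration.

```python
import numpy as np

def spectrum_summary(X):
    """Return (trace, nuclear norm, smallest eigenvalue) of a symmetric matrix."""
    eig = np.linalg.eigvalsh(X)
    return np.trace(X), np.abs(eig).sum(), eig.min()

I2 = np.eye(2)
P = np.array([[0.0, 1.0], [1.0, 0.0]])    # swaps the two features

# All pairwise matches agree with each other.
X_consistent = np.block([[I2, I2, I2], [I2, I2, I2], [I2, I2, I2]])
# The match between images 1 and 3 is swapped, which breaks the cycle.
X_broken = np.block([[I2, I2, P], [I2, I2, I2], [P, I2, I2]])

tr_c, nuc_c, min_c = spectrum_summary(X_consistent)
tr_b, nuc_b, min_b = spectrum_summary(X_broken)
```

The inconsistent matrix has a negative eigenvalue, and its nuclear norm exceeds the trace by twice the magnitude of that eigenvalue, so a large $\lambda$ favors the consistent solution.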


¹ We use the term "nearly positive semidefinite" to refer to the property that the negative eigenvalues of a matrix, if any exist, are negligible compared to the norm of the matrix.

Table 1. The CPU time (seconds) for one iteration of MatchALS, MatchLift [10] and partial SVD [21]. $n$, $p$, and $m$ denote the number of images, the number of points per image, and the dimension of $X$, respectively. We set $k = 2p$ for MatchALS.

5. Fast alternating minimization 
5.1. Optimization in the low-rank space 

The nuclear norm minimization in (10) is convex and the state-of-the-art methods to solve this family of problems are the proximal method [30] or ADMM [2] based on iterative singular value thresholding [6]. However, singular value decomposition (SVD) needs to be performed in each iteration, which is extremely expensive even for a medium-sized problem. For instance, if there are 20 images with 500 features per image to match, we have to optimize for a 10,000 × 10,000 matrix. A single SVD for such a matrix takes hundreds of seconds on a typical PC even if a partial SVD solver [21] is used. See Table 1 and Section 5.3.

Fortunately, recent results on low-rank optimization have shown that one can solve the problem more efficiently via a change of variables $X = AB^T$ [5, 15], where $A, B \in \mathbb{R}^{m \times k}$ are new variables with a smaller dimension $k < m$. More importantly, the change of variables will not introduce additional local minima if $k$ is larger than the rank of the original solution. This result was originally derived for low-rank SDP [4, 20] but also applies here since a nuclear-norm minimization problem can be rewritten as SDP [31].

Inspired by these works, we propose the following low-rank factorization-based formulation in order to leverage the underlying low dimensionality of our problem:

$$ \min_{A, B} \; \langle W, AB^T \rangle + \lambda \|AB^T\|_*, \quad \text{s.t. } AB^T \in C. \tag{11} $$

Moreover, with the following equation [31],

$$ \|X\|_* = \min_{A, B : AB^T = X} \frac{1}{2} \left( \|A\|_F^2 + \|B\|_F^2 \right), \tag{12} $$

we finally obtain the following formulation:

$$ \min_{A, B} \; \langle W, AB^T \rangle + \frac{\lambda}{2} \|A\|_F^2 + \frac{\lambda}{2} \|B\|_F^2, \quad \text{s.t. } AB^T \in C. \tag{13} $$

The selection of matrix dimension $k$ is critical to the success of change of variables, while it directly affects the computational complexity. We will first provide the algorithm, analyze its complexity and then discuss the selection of $k$.
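The identity in (12) can be checked numerically. The sketch below, on a hypothetical random rank-3 matrix, shows that the balanced factorization obtained from the SVD attains the nuclear norm, while another factorization of the same matrix gives a value that is no smaller. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 6))   # rank-3 matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = U * np.sqrt(s)            # scale the columns of U by sqrt(singular values)
B = Vt.T * np.sqrt(s)         # so that A @ B.T == X

def half_frobenius(A, B):
    """(||A||_F^2 + ||B||_F^2) / 2, the right-hand side of (12) before minimizing."""
    return 0.5 * (np.linalg.norm(A, "fro") ** 2 + np.linalg.norm(B, "fro") ** 2)

nuclear = s.sum()
balanced = half_frobenius(A, B)

# Another factorization of the same X: A2 @ B2.T == A @ G @ inv(G) @ B.T == X
G = rng.standard_normal((6, 6)) + 3.0 * np.eye(6)
A2 = A @ G
B2 = B @ np.linalg.inv(G).T
unbalanced = half_frobenius(A2, B2)
```

This is why the nuclear norm in (11) can be replaced by the two Frobenius-norm terms in (13) without changing the minimum.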

5.2. Algorithms 

The problem in (13) is not straightforward to solve due to the constraint on the product of variables. Instead, we rewrite the problem as

$$ \min_{X, A, B} \; \langle W, X \rangle + \frac{\lambda}{2} \|A\|_F^2 + \frac{\lambda}{2} \|B\|_F^2, \quad \text{s.t. } X = AB^T, \; X \in C, \tag{14} $$

and apply the ADMM [2] to solve (14).

The augmented Lagrangian of (14) reads:

$$ \mathcal{L}_\mu(X, A, B, Y) = \langle W, X \rangle + \frac{\lambda}{2} \|A\|_F^2 + \frac{\lambda}{2} \|B\|_F^2 + \langle Y, X - AB^T \rangle + \frac{\mu}{2} \|X - AB^T\|_F^2, \tag{15} $$

where $Y$ is the dual variable and $\mu$ is a parameter controlling the step size in optimization. We keep the constraint $X \in C$ since it can be easily handled as we will show later. Then, the ADMM alternately updates each primal variable by minimizing $\mathcal{L}_\mu$ and updates the dual variable via gradient ascent while fixing all other variables. The overall algorithm is summarized in Algorithm 1.


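Algorithm 1: Multi-Image Matching via Alternating Least Squares (MatchALS)

Input: Pairwise affinity scores $S$
Output: Globally consistent matches $X$

1 randomly initialize $A$ and $B$, $Y = 0$;
2 $W = \alpha \mathbf{1} - S$;
3 while not converged do
4     $A \leftarrow (X + \frac{1}{\mu} Y) B (B^T B + \frac{\lambda}{\mu} I)^{-1}$;
5     $B \leftarrow (X + \frac{1}{\mu} Y)^T A (A^T A + \frac{\lambda}{\mu} I)^{-1}$;
6     $X \leftarrow \mathcal{P}_C (AB^T - \frac{1}{\mu} (W + Y))$;
7     $Y \leftarrow Y + \mu (X - AB^T)$;
8 end
9 quantize $X$ with a threshold equal to 0.5.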


Minimizing $\mathcal{L}_\mu$ over $A$ turns out to be a regularized least squares problem with a closed-form solution given in Step 4 in Algorithm 1. The update of $B$ can be solved similarly. The update of $X$ requires solving:

$$ \min_{X} \; \left\| X - AB^T + \frac{1}{\mu} (W + Y) \right\|_F^2, \quad \text{s.t. } X \in C, \tag{16} $$

and the solution turns out to be a projection to $C$. Since the constraints in $C$ are all linear, the projection can be solved conveniently. We denote the solution by $\mathcal{P}_C(\cdot)$ and leave the details to the supplementary material.
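The ADMM iteration described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the $A$ and $B$ updates are the ridge-regression solutions obtained by setting the gradient of (15) to zero; the projection handles only (7), (8) and (9), with (1) dropped as in Section 4.1; and the default `mu`, the fixed iteration count, the initial value of `X`, and the names `project_C` and `match_als` are all hypothetical choices.

```python
import numpy as np

def project_C(M, sizes):
    """Project onto C without (1): symmetric, entries in [0, 1], identity diagonal blocks."""
    X = np.clip(0.5 * (M + M.T), 0.0, 1.0)
    start = 0
    for p in sizes:
        X[start:start + p, start:start + p] = np.eye(p)
        start += p
    return X

def match_als(S, sizes, k, alpha=0.1, lam=50.0, mu=64.0, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    m = S.shape[0]
    W = alpha - S                              # W = alpha * 1 - S
    A = rng.standard_normal((m, k))
    B = rng.standard_normal((m, k))
    Y = np.zeros((m, m))
    X = project_C(S, sizes)                    # assumed starting point for X
    reg = (lam / mu) * np.eye(k)
    for _ in range(n_iter):                    # fixed count instead of a convergence test
        Z = X + Y / mu
        A = np.linalg.solve(B.T @ B + reg, B.T @ Z.T).T   # regularized least squares in A
        B = np.linalg.solve(A.T @ A + reg, A.T @ Z).T     # regularized least squares in B
        X = project_C(A @ B.T - (W + Y) / mu, sizes)      # solves (16)
        Y = Y + mu * (X - A @ B.T)                        # dual ascent step
    return (X > 0.5).astype(int)               # quantize with threshold 0.5

# Toy usage: three images that each see the same three universe points in a different order
rng = np.random.default_rng(1)
sizes = [3, 3, 3]
maps = np.vstack([np.eye(3)[rng.permutation(3)] for _ in sizes])
S = maps @ maps.T
X_hat = match_als(S, sizes, k=6, n_iter=100)
```

Symmetrizing and then clipping is the exact Euclidean projection onto the symmetric matrices with entries in $[0, 1]$, because the problem separates over each pair of mirrored entries; the diagonal blocks are then fixed to identity.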

5.3. Computational complexity 

The time complexity of an iteration in Algorithm 1 is dominated by matrix multiplication that requires $O(m^2 k)$ flops.² We compare it to the state-of-the-art algorithm MatchLift [10], which is based on SDP. The time complexity of an iteration in MatchLift is dominated by the eigenvalue decomposition that requires $O(m^3)$ flops. As $m$ is much larger than $k$, MatchALS has a lower complexity compared to MatchLift. Moreover, matrix multiplication is parallelizable and has been inherently multithreaded in Matlab, while the parallelization of eigenvalue decomposition is an open problem. Both MatchALS and MatchLift are based on ADMM and, in our observation, require similar numbers of iterations to converge.

The CPU time for some problem sizes is shown in Table 1. The algorithms are implemented in Matlab and tested on a PC with an Intel i7 3.4GHz CPU and 8 GB RAM. We also compare the time cost of partial SVD using PROPACK [21], a toolbox widely used to solve large-scale matrix completion problems [24]. In partial SVD, only $r$ leading singular vectors are computed, which is much faster than full SVD when $r/m$ is extremely small. But it is not efficient for a relatively large $r$. In our problem, $r$ should be larger than the true rank and we test partial SVD with $r = p$ in Table 1.
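As a rough worked example of the two complexity terms, take the problem size mentioned in Section 5.1 (20 images with 500 features each) together with the setting $k = 2p$ used for Table 1. The figures below ignore constant factors and lower-order terms, so they indicate only the ratio of the dominant terms, not a predicted runtime.

```latex
\begin{aligned}
m &= n \cdot p = 20 \times 500 = 10^{4}, \qquad k = 2p = 10^{3},\\
m^{2} k &= 10^{8} \times 10^{3} = 10^{11}, \qquad m^{3} = 10^{12},\\
\frac{m^{3}}{m^{2} k} &= \frac{m}{k} = 10 .
\end{aligned}
```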

5.4. Selection of k and rank reduction

From the previous subsections we see that $k$ determines the complexity of MatchALS and $k$ should be larger than the rank of the true solution, i.e., the size of universe. While some spectral techniques have been proposed in previous work for rank estimation [10], we found that the estimation was inaccurate when the input was noisy and incomplete. Fortunately, our solution doesn't depend on $k$ if $k$ is larger than the underlying true rank (demonstrated later in Figure 3). A heuristic choice is to set $k = 2\hat{r}$, where $\hat{r}$ is a rough estimate of the size of universe.

In real applications, there are likely to be many isolated features in each image which don't have any correspondence in other images. However, the constraint in (7) implies that every image feature must be matched to a point in the universe. To see this, recall that we hope $X = AA^T$ in (4). If diagonal values of $X$ are all ones, every row of $A$ has a unit norm, which indicates a match to the universe. Therefore, the size of universe is dramatically increased by those isolated features, and consequently a very large $k$ needs to be selected, which severely increases the computation. To address this issue, we loosen the constraint in (7) to be

$$ \mathrm{trace}(X) = m', \qquad \text{off-diagonal values of each } X_{ii} = 0, \tag{17} $$

where $m' \le m$ is a predefined constant. When $m' = m$, (17) is reduced to (7). When $m' < m$, we allow some rows and columns in $X$ to be null, which is most likely to happen for the rows and columns corresponding to the isolated features, since "switching" them off will not lose many affinity scores but will reduce the nuclear norm immediately. By using such a "rank reduction" strategy, the algorithm can automatically prune the isolated features and reduce the size of universe, which enables us to select a smaller $k$ for better computational efficiency. We set $m' = m$ in simulation since there is no isolated feature and $m' = 0.7m$ in real experiments.

² The details are given in the supplementary material.

6. Experiments
6.1. Simulation

We evaluate the performance of the proposed method using synthesized data. Given a permutation matrix $X$ and the ground truth $X^*$, we measure the error rate by intersection over union:

$$ 1 - \frac{|\tau(X) \cap \tau(X^*)|}{|\tau(X) \cup \tau(X^*)|}, \tag{18} $$

where $\tau$ denotes the matches defined by a permutation matrix and $|\cdot|$ means the size of a set.
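The intersection-over-union error measure of (18) can be sketched as follows, assuming the error is one minus the ratio of shared matches to all matches in either matrix. The function names `match_set` and `error_rate` are hypothetical.

```python
import numpy as np

def match_set(X):
    """Set of (row, column) index pairs where the binary matrix X equals 1."""
    rows, cols = np.nonzero(X)
    return set(zip(rows.tolist(), cols.tolist()))

def error_rate(X, X_true):
    """One minus intersection over union of the two match sets."""
    est, ref = match_set(X), match_set(X_true)
    union = est | ref
    if not union:                 # no matches at all: nothing to get wrong
        return 0.0
    return 1.0 - len(est & ref) / len(union)

X_true = np.eye(3, dtype=int)
X_est = np.array([[1, 0, 0],
                  [0, 0, 1],
                  [0, 1, 0]])     # one correct match, two swapped
err = error_rate(X_est, X_true)
```

In this toy case one match is shared and five distinct matches appear in total, so the error is 1 - 1/5.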


6.1.1 Matching errors 

We follow the settings in [10] to evaluate the performance of MatchALS and compare it to alternative methods. The size of universe is fixed as 20 points and in each image a random sample of the points is observed with a probability denoted by $p_o$. The number of images is denoted by $n$. Then, the ground-truth pairwise matches are established, and random corruptions are simulated by removing some true matches and adding some false matches to achieve an error rate of $p_e$. Finally, the corrupted permutation matrix is fed into Algorithm 1 as the input affinity scores.

We evaluate the performance of MatchALS under various $p_o$, $p_e$ and $n$. We compare MatchALS to two related methods: MatchLift [10] and the spectral method [29]. Both of the alternative methods require knowing the size of universe and we provide the true value $r^* = 20$. For MatchALS parameters, we set $k = 2r^*$ and $\lambda = 50$.
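A data generator in the spirit of this simulation might look like the sketch below. The corruption model is an assumption made for illustration: with probability $p_e$, the match of a feature is replaced by a match to a uniformly random feature of the other image. The exact procedure of [10] is not described here, and the function name `simulate` is hypothetical.

```python
import numpy as np

def simulate(n, k=20, p_o=0.6, p_e=0.3, seed=0):
    """Return (corrupted input S, ground truth X_true, features per image)."""
    rng = np.random.default_rng(seed)
    maps = []
    for _ in range(n):
        seen = np.flatnonzero(rng.random(k) < p_o)    # observed universe points
        if seen.size == 0:
            seen = rng.integers(k, size=1)            # keep at least one feature
        seen = rng.permutation(seen)                  # random feature order
        A_i = np.zeros((seen.size, k), dtype=int)
        A_i[np.arange(seen.size), seen] = 1
        maps.append(A_i)
    sizes = [A_i.shape[0] for A_i in maps]
    A = np.vstack(maps)
    X_true = A @ A.T                                  # consistent matches, eq. (4)

    S = X_true.copy()
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    for i in range(n):
        for j in range(i + 1, n):
            rows = slice(offsets[i], offsets[i + 1])
            cols = slice(offsets[j], offsets[j + 1])
            block = S[rows, cols]                     # view into S
            for r in range(sizes[i]):
                if rng.random() < p_e:                # assumed corruption model
                    block[r, :] = 0
                    block[r, rng.integers(sizes[j])] = 1
            S[cols, rows] = block.T                   # keep S symmetric
    return S, X_true, sizes

S, X_true, sizes = simulate(n=5, seed=1)
S_clean, X_clean, _ = simulate(n=4, p_e=0.0, seed=2)
```

The returned `S` can be passed directly as the affinity matrix, with `sizes` giving the block structure needed for the constraint in (7).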

The output error rates under various settings are shown in Figure 2. When the number of images is sufficiently large, all methods can achieve nearly exact recovery even if the input error rate is larger than 50%, which demonstrates the power of joint matching. MatchALS and MatchLift achieve very similar performances and outperform the spectral method especially when the observation ratio is small. Compared to MatchLift, the proposed method obtains a competitive performance without exactly knowing the true rank and requires much less computation time.

Figure 2. The 2D plot of matching errors under various problem settings for the spectral method [29], MatchLift [10] and the proposed MatchALS. In the left column, the number of images $n$ and the input error rate $p_e$ are varying, while the observation ratio $p_o = 0.6$. In the right column, $p_o$ and $p_e$ are varying, while $n = 20$. Lower intensity indicates smaller error and overall a larger dark region indicates a better performance.

Figure 3. The estimation error versus the input rank $r$ and the weight of nuclear norm $\lambda$. The true rank $r^* = 20$. Here we set $k = r$ for MatchALS.

6.1.2 Sensitivity to parameters 

The sensitivity of MatchALS to the parameters in (13) is illustrated in Figure 3. The figure shows that MatchALS is insensitive to the predefined dimension of factor matrices $k$ when $k$ is larger than the true rank $r^*$, as we explained in Section 5.1. When $k < r^*$, the problem in (13) is no longer equivalent to the original convex problem in (10), and consequently the alternating minimization fails. In practice, we choose $k = 2\hat{r}$ as a compromise between safety and efficiency. The right panel in Figure 3 illustrates that the algorithm is insensitive to $\lambda$ when $\lambda$ is sufficiently large as we explained in Section 4.2. In all our experiments, we set $\lambda = 50$.

6.2. Real experiments 
6.2.1 Graffiti datasets 

We evaluate the performance of our algorithm on six benchmark datasets from the Graffiti datasets.³ In each dataset, there are six images of a scene with various image transformations such as viewpoint change, blurring, illumination variation, etc.

We detect 1000 affine covariant features [27] with SIFT [25] descriptors from each image using the VLFeat library [32]. For each image pair $(i, j)$, we compute the inner products between feature descriptors as affinity scores, keep only the scores larger than 0.7, and collect them in $S_{ij}$. If the ratio between the first and the second largest scores in a row/column is smaller than 1.1, we set all scores in this row/column to be zero in order to remove indistinctive features. After computing all $S_{ij}$, we remove the features that have candidate matches in fewer than two images since they have no contribution to joint matching. Finally, we input the affinity scores to Algorithm 1 to obtain the optimized joint matches.
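The score thresholding and ratio test described above can be sketched as follows. The sketch assumes L2-normalized descriptors stored as rows and at least two features per image, and it applies the ratio test to the thresholded scores, which is an assumption since the order of the two steps is not spelled out. The function name `affinity_scores` and the toy descriptors are hypothetical.

```python
import numpy as np

def affinity_scores(D_i, D_j, min_score=0.7, min_ratio=1.1):
    """Inner-product affinities with score thresholding and a row/column ratio test."""
    S = D_i @ D_j.T
    S[S <= min_score] = 0.0                      # keep only scores larger than 0.7

    def indistinct(M):
        top2 = np.sort(M, axis=1)[:, -2:]        # two largest scores per row
        second, first = top2[:, 0], top2[:, 1]
        return (second > 0) & (first < min_ratio * second)

    bad_rows = indistinct(S)                     # test rows and columns on the
    bad_cols = indistinct(S.T)                   # same thresholded scores
    S[bad_rows, :] = 0.0
    S[:, bad_cols] = 0.0
    return S

# Toy unit-norm descriptors: the first feature of image i is ambiguous
D_i = np.array([[1.0, 0.0], [0.0, 1.0]])
D_j = np.array([[1.0, 0.0], [np.cos(0.1), np.sin(0.1)], [0.0, 1.0]])
S_ij = affinity_scores(D_i, D_j)
```

In the toy example the first feature of image $i$ scores almost equally against two features of image $j$, so its whole row is zeroed, while the unambiguous second feature keeps its single candidate.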

For evaluation, we adopt the metric used in [10]: for a testing point in an image, we calculate the distance between its estimated correspondence and the true correspondence in another image. If the distance is smaller than a threshold, we regard that a correct match is found for this testing point. Then, we plot the percentages of testing points with correct matches versus the threshold values and obtain a curve analogous to a precision-recall curve. If a testing point is not aligned with any detected point, its estimated correspondence is obtained by interpolation. In this experiment, we use all detected feature points in the first image as testing points and evaluate the matches from the first image to the other five images. True correspondences are computed from the homography matrices provided in the datasets.

The performance curves on three datasets are shown in Figure 4. A curve closer to the upper-left corner indicates a better performance. The area under curve and computation time for all datasets are summarized in Table 2. All of the joint matching methods achieve obvious improvements compared to the original pairwise matching. MatchALS and MatchLift perform similarly and outperform the spectral method, which coincides with the observation in simulation. Regarding computation time, MatchALS achieves a remarkable speedup (~30 times on average) compared to MatchLift.

We select three image pairs to visually demonstrate the effect of joint matching in Figure 5. A match with a deviation less than five pixels from the ground truth is declared as true. Clearly, the joint matching can prune the false matches (fewer blue lines), complete some missing matches (denser yellow lines), and achieve almost correct matching for these image pairs with large disparities in viewpoints, blurring and illumination changes.

³ http://www.robots.ox.ac.uk/~vgg/data/data-aff.html

Figure 4. The performance curves on the Graff, Bikes and Light datasets. The y-axis shows the percentages of correct matches. The x-axis shows the distance threshold over the image width. Please see Section 6.2.1 for details. Four methods are compared: MatchALS, MatchLift [10], the spectral method [29], and the original pairwise matching. The areas under curves for all six datasets are given in Table 2.

Figure 5. The matches between the 1st and the 4th images on the Graff, Bikes and Light datasets. Best viewed in color. The true matches and false matches are shown in yellow and blue, respectively. The top and bottom rows correspond to the results of pairwise matching and joint matching by MatchALS, respectively.

Dataset     Pairwise (input)   MatchALS   Spectral   MatchLift
Graffiti    60.2%              87.3%      75.6%      80.6%
Bikes       76.8%              94.3%      86.7%      92.5%
Boat        86.2%              93.9%      87.7%      91.7%
Light       76.0%              93.9%      90.0%      94.0%
Bark        71.7%              92.2%      91.2%      90.0%
UBC         88.0%              97.0%      92.9%      96.8%
Time (s)    -                  85.8       86.4       2518.4

Table 2. The matching scores and the average computation time (seconds) on the Graffiti datasets. The score is calculated as the area under the curve shown in Figure 4.


6.2.2 Matching different objects 

Recent years have witnessed growing interest in reconstructing category-specific object models from single images, which is still an open problem [33, 9]. Among a series of challenges, feature matching for different object instances is the foremost and previous work usually assumed that correspondences of some keypoints were given [33, 9]. In this section, we demonstrate the applicability of joint matching to solve this problem.

We use the FG3DCar datasets [23] and try to match the images of different car models in the same category (e.g., sedan or SUV). Following the general practice in object reconstruction [33, 9], we assume segmentation is provided such that background can be ignored, and we only match images with similar views. We select nine sedans and eight SUVs and match two sets of images separately. Note that the car models are all different from each other. See Figure 6 for examples.

To extract descriptive features, we first detect image edges by the structured forests [13] and sample a number of points on the edges with constant spacing. On average, we obtain ~600 feature points for each image. Since the object appearance changes from image to image and the features are automatically extracted, the matching is extremely difficult. Inspired by recent works [35, 9], we adopt deep features, i.e., middle-layer responses of convolutional neural nets (CNN), as descriptors for feature matching. More specifically, we use the publicly available deep learning toolbox Caffe [17] and the pre-trained CNN AlexNet [19]. We feed a 192 × 192 patch around each feature point forward through AlexNet. The center columns of the conv4 and conv5 layers are concatenated and normalized to form a 640-dimensional feature vector. To leverage the prior on object rigidity, we use pairwise graph matching solved by the Reweighted Random Walk algorithm [11] and collect the output scores of candidate matches as affinity scores. Then, we delete the points with candidate matches in fewer than two images and run MatchALS.

Figure 6. Matching different cars. Best viewed in color. Left: the correspondences of sedan images and the reconstruction. Right: the correspondences of SUV images and the reconstruction. Only four selected images are shown for each image set. Note that the cars in images are all different and the feature points are automatically detected. The markers with the same color indicate the matched feature points. The 3D reconstruction is rendered with the colors in the first image and visualized in two viewpoints.

Figure 7. The performance curve of car image matching. "Deep" represents deep features. "GM" denotes graph matching. "Joint" means joint matching using the proposed method.

We adopt the same metric introduced in Section 6.2.1 for quantitative evaluation and use the manually-annotated landmarks provided in the datasets as ground truth. The result on the sedan images is shown in Figure 7. Matching with SIFT features fails since local image patterns are different for two cars. Graph matching with deep features obtains a much better performance, which is further improved by the proposed joint matching algorithm. We obtain a very similar result on the SUV images, which is not plotted.

The results are visualized in Figure 6. The corresponding parts of cars are basically matched in spite of the large differences in appearances and viewpoints. Note that the features are automatically detected and therefore not fully overlapped for two images. For a simple demonstration, we run rigid reconstruction from the estimated feature correspondences by triangulation with an orthographic camera model and the viewpoints provided in the dataset. Despite some noise and missing points, we can clearly see the 3D structures of a sedan and an SUV. We believe that more appealing reconstructions can be obtained by using sophisticated reconstruction techniques and more information such as object silhouette and surface smoothness, but these are out of the scope of this paper.

7. Conclusion 

In this paper, we proposed a practical solution to multi-image matching. We use pairwise feature similarities or graph matching scores as input and obtain accurate matches by an efficient algorithm that globally optimizes for both feature affinities and cycle consistency of matches. The experiments not only validate the effectiveness of the proposed method but also demonstrate that joint matching is a promising approach to matching images with different object instances as the first step towards reconstructing object models from crowd-sourced image collections. As future work, we would like to explore more applications and incremental algorithms for joint matching.